Popularity-based selective replication in content delivery network

ABSTRACT

A client requests an object at a first server in a content delivery network (CDN), the request having been directed to said first server regardless of whether the first server has the requested object. When the first server does not have a copy of the requested object, the requested object is selectively replicated on the first server. The replicating is based at least in part on a measure of popularity of the requested object, wherein the requested object is not replicated to the first server when the measure of popularity of the requested object does not exceed a popularity threshold.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of co-pending U.S. patent application Ser. No. 11/715,316, titled “Managed Object Replication And Delivery” filed Mar. 8, 2007, which is a continuation of co-pending U.S. patent application Ser. No. 10/073,938 titled “Managed Object Replication And Delivery” filed Feb. 14, 2002, the disclosures of each of which are incorporated herein by reference in their entirety. This application is also related to U.S. patent application Ser. No. ______, (attorney docket no. 2711-0101), titled “Peer Server Handoff in Content Delivery Network,” and filed on even date herewith, the entire disclosure of which is incorporated herein by reference.

BACKGROUND

This invention relates in general to the field of computer networks. Particularly, aspects of this invention pertain to managed object replication and delivery over a network.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention are illustrated in the accompanying drawings in which like references indicate similar or corresponding elements and in which:

FIG. 1 is a high-level block diagram of a topology of the managed object replication and delivery method and system according to embodiments of the invention;

FIG. 2 is a high-level block diagram illustrating the data flows of the managed object replication and delivery method according to embodiments of the invention;

FIGS. 3(a), 3(b) and 3(c) are a flow chart of the managed object replication and delivery method and the object purging method according to embodiments of the invention;

FIG. 4 is a flow chart of a popularity computation according to embodiments of the invention;

FIG. 5 is a flow chart of a replication scheme according to embodiments of the invention;

FIG. 6 is a flow chart of a purge scheme according to embodiments of the invention; and

FIG. 7 is a block diagram of the managed object replication and delivery system according to embodiments of the invention.

DETAILED DESCRIPTION

A typical content delivery network (CDN) operator deploys one or more parent servers, hosting a plurality of objects, in a network and one or more edge servers at the edge of the network to facilitate more cost-effective and efficient delivery of such objects to an end-user (client). End-users or client proxies that access customers' objects are called clients. Content provider companies, organizations, etc. that subscribe to the CDN service are referred to as customers. As used herein, an object includes, without limitation, an audio file (such as, e.g., an MP3 (Moving Picture Experts Group-1 Layer 3) file or a RealNetworks, Inc. Real format file), a video file (such as an MPEG file), an image file (such as, e.g., a BMP (bitmap) file or JPEG (Joint Photographic Experts Group) file) and any other software or data file or object. It is typically desirable to serve objects from edge servers because the edge servers are typically closer (by various measures of distance) to end-users. For example, streaming content data from edge servers saves parent-to-edge bandwidth. Furthermore, a shorter distance for objects to travel can also mean reduced network congestion and packet losses, which can lead to a better experience for the end-user through faster response times and better quality of service.

It is typically not feasible to store all objects on the edge servers. The main difficulty is that many such objects are very large (typically on the order of 10 MB (10,000,000 bytes), and in the neighborhood of 500 MB for movies). The storage and rack space required to accommodate often large and sometimes rarely requested objects at every edge server can be cost prohibitive as the number of customers grows and the number of their objects increases. It may not even be possible to store a good working set of objects, for example a set of objects thought to be requested often and/or better suited to be served from an edge server, because of the size of, and changing demand for, objects in the working set.

One obvious solution is to pre-populate edge servers with objects for which there will likely be a significant or high demand. However, it is difficult to predict popularity and difficult to manage pre-populating. A related solution is to associate objects with two or more domains depending on popularity of the object, e.g., one domain for popular objects (served from edge servers) and another domain for less popular objects (served from parent servers). However, this requires some way to pre-determine statically which objects are popular and which are less popular, and to build that popularity into the domain name of the object. As with pre-populating, it is difficult to predict popularity and to manage assignment of domains based on such popularity determinations.

Other solutions fetch objects on demand. In such schemes, when a requested object is not available on a handling edge server, a connection is made between a parent server having the requested object and the handling edge server to fetch the requested object from the parent server. Such fetching suffers, however, from having to go through the parent path (the network path between the handling edge server and the parent server with the object) whenever a client requests an object that is not already at the particular edge server.

Fetching a large object to the handling edge server through a parent path can be slow. For example, there may be limited available bandwidth from the parent server to the handling edge server, i.e., sometimes the parent path has less bandwidth than even the network path from the edge server to the client (e.g., the “last mile” in a broadband network). If a parent server uses too much bandwidth copying an object to an edge server, this can create congestion at that parent server. If storage fill bandwidth is matched to client bandwidth, it is difficult to handle a second, faster client. And if the fetch is done using a streaming protocol (for instance, the Real-Time Streaming Protocol (RTSP) and Real-Time Transport Protocol (RTP) standards), the quality of the copy made can be hurt due to lost packets (“thinning”).

Moreover, there may be an unreliable end-to-end parent path due to network congestion. And, if a parent server has to preprocess an object (e.g., to generate an image at a specific bit rate) or is otherwise busy with other tasks, this may further slow its ability to serve the request for the object fast enough. For example, if a client requests a bit rate higher than the parent-to-edge bit rate, delays will likely occur. Under such conditions, the parent server may fail, for example, to stream the object in time or to maintain the stream of an object at a requested bit rate, thereby causing a thinned object, i.e., an object with lower quality due to lost packets in its transmission, to be populated at the edge server and delivered to subsequent clients requesting the same object.

Thus, it would be advantageous to populate edge servers with the most popular objects yet somehow serve the rest from parent servers with a goal to maximize the amount of object bits served from edge servers of the network. It would also be advantageous to populate edge servers by, for example, storage fill on demand when an object is popular enough, without having to make the end-user wait for such population. Therefore, it would be advantageous to provide a method and system for managed object replication and delivery over a network.

According to embodiments of the invention, a method and system for managed object replication and delivery over a network redirects, directly or indirectly, a client's request for an object that is not available at a best or optimal handling edge server of the network to a parent server of the network that has the requested object. So, where the requested object is not available at the handling edge server, the client's request is redirected directly to the parent server that can provide the requested object to the client or indirectly via one or more parent servers to a parent server that can provide the requested object to the client. The method and system further intelligently replicates the object to the edge server if the object is popular enough. Likewise, an object is removed from an edge server when the object is no longer popular. All redirection and replication operations are preferably transparent to the end-user and do not degrade the quality of service. Other embodiments of the invention are possible and some are described hereafter.

So, for example, under the framework described herein, a request for a streaming object will be served by a handling edge server if that handling edge server has a copy of that object. Otherwise, the request is redirected, directly or indirectly, to a parent server for service of the requested streaming object to the client. If the requested streaming object is popular, the object is replicated from a parent server that has the requested streaming object to the handling edge server so that the handling edge server will serve the object from the edge of the network when the object is requested in the future. If a streaming object is no longer popular, the object is removed from an edge server.

As used herein, replication generally refers to the permanent and/or volatile storage of an object in a server, particularly an edge server and, if applicable, a parent server. Accordingly, the term replication will be considered synonymous with storing, caching and copying. In typical embodiments, replication of an object will usually refer to temporary storage of the object in an edge server and/or a parent server for an undefined duration.

A typical network for the managed object replication and delivery method according to embodiments of the invention is illustrated in FIG. 1. The network 100 comprises one or more parent server sites 120 and one or more edge server sites 130. The network also optionally has access to one or more origin server sites 110. The origin server sites are typically owned and/or maintained by the network provider's customers for storing and serving one or more objects. Each customer (content provider) may have its own origin server site. Furthermore, one or more clients 140 access the network to request one or more objects. A parent server site (or simply parent site or parent server) may comprise one parent server or a cluster of parent servers. Likewise, an edge server site (or simply edge site or edge server) may comprise one edge server or a cluster of edge servers and an origin server site (or simply origin site or origin server) may comprise one origin server or a cluster of origin servers. Typically, the network 100 is configured such that servers in a cluster share a common storage. In any event, configuration details of the parent server site, edge server site, and the origin server site are not important to the present invention.

In the typical network, the parent servers and edge servers are maintained by a network provider, wherein the parent servers are primarily used for storing and managing one or more objects and edge servers are primarily used for serving objects to clients. In some embodiments, all the objects are retrieved from origin servers and stored over one or more parent servers before any end-users can access each such object as the object is stored on the parent servers. Accordingly, in these embodiments, the origin servers play no significant role in the managed object replication and delivery method except to supply new and/or updated objects for storage on the parent servers. Moreover, only the parent servers communicate with the origin servers. In other embodiments, each requested object is replicated from one or more origin servers to one or more parent servers (and/or one or more edge servers) when the requested object becomes popular (as described in more detail below). In these embodiments, the origin servers play a more significant role in the managed object replication and delivery method to supply objects to parent and/or edge servers when requested. So, in these embodiments, the origin servers and parent servers communicate with each other and the origin servers and clients may also communicate with each other. In all of these embodiments, the communications relationships between origin servers and parent servers may be one-to-one, one-to-many or many-to-many.

Further, as shown in FIG. 1, the parent servers and edge servers communicate with each other, edge servers and clients communicate with each other and parent servers and clients communicate with each other. While in embodiments, as shown in FIG. 1, the edge servers have a one-to-one or one-to-many communications relationship with parent servers, edge servers may also have many-to-many communications relationships with parent servers. As discussed in more detail below, the edge servers act as the primary source of serving objects but if a requested object is not available at the edge server a parent server that has the requested object will serve the requested object to the clients. Also, FIG. 1 shows a single layer or level of parent servers and origin servers. As will be apparent to those skilled in the art, more than one layer or level of parent servers and/or origin servers may be used.

According to embodiments of the invention and referring to FIGS. 2, 3(a), 3(b) and 3(c), the method of managed object replication and delivery and the method of object purging are depicted. FIG. 2 depicts embodiments of the method in relation to a portion of the network 100, an origin server 110 and a client 140 as shown in FIG. 1. FIGS. 3(a), 3(b) and 3(c) depict embodiments of the method in flowchart form.

Initially, the method of managed object replication and delivery directs (at 200, 300) a client, requesting one or more objects, to an edge server in the network, whether or not the edge server has the requested object(s). Preferably, the client is directed to an optimal edge server, e.g., based on network traffic conditions and server load. As will be apparent to those skilled in the art, any number of currently known or future developed mechanisms may be used to select a best or optimal edge server. Determination of a best or optimal edge server preferably includes selection of an edge server most suitable for delivery of one or more objects to the client according to any number of currently known or future developed algorithms. For example, determination of a best or optimal edge server may be performed based on the likelihood of a copy of the requested object(s) being available at the candidate edge server, on the bandwidth between a candidate edge server and the client, on a best repeater selector (for example, as described in U.S. Pat. No. 6,185,598) and/or on any number of other criteria.

The selected best or optimal edge server 130 determines (at 305) whether the edge server already has the requested object and, if so, serves (at 205, 310) the object to the requesting client 140. For example, the selected edge server 130 will check its storage to determine whether the requested object is available and if so, may serve the object to the requesting client 140.

If the selected edge server does not have the requested object, a check is initiated (at 315) for the edge server to determine whether the requested object is popular and if so, to replicate the popular requested object to the edge server. In embodiments, the method depicted in FIG. 3(b) and discussed in more detail below is employed to determine whether the requested object is popular and if so, to replicate the popular requested object to the edge server.

In embodiments, the checking of whether the requested object is popular and replicating the popular requested object to the edge server may be performed independently of one or more functions of the method of managed object replication and delivery, such as the checking if a server has the requested object and serving the requested object to the client if the server has the requested object or redirecting the client to a server that has the requested object (and serving the requested object to the client). Thus, in embodiments, the checking of whether the requested object is popular and replicating the popular object to the edge server may be performed in parallel with or before the performance of certain functions of the method of managed object replication and delivery such as the checking if a server has the requested object and serving the requested object to the client if the server has the requested object or redirecting the client to a server that has the requested object (and serving the requested object to the client). Advantageously, should the checking, redirecting and serving of the requested object fail, the checking of whether the requested object is popular and replicating the popular object to the edge server can manage the continued delivery of objects to clients from edge servers. Similarly, if the checking of whether the requested object is popular and replicating the popular object to the edge server should fail, the checking, redirecting and serving of the requested object can manage the continued delivery of objects from servers in the network.

Further, if the selected edge server does not have the requested object, the selected edge server directs (at 210, 320) the requesting client 140 to a parent server 120. Preferably the client 140 is redirected to a parent server that has the requested object and is able to serve (at 215, 345) the requested object to the client. If a parent server does not have (at 325) the requested object, a check is initiated (at 330) for the parent server to determine whether the requested object is popular and if so, to replicate the popular requested object to the parent server. In embodiments, the method depicted in FIG. 3(b) and discussed in more detail below is employed to determine whether the requested object is popular and if so, to replicate the popular requested object to the parent server. As with the check for the edge server, in embodiments, the checking of whether the requested object is popular and replicating the popular requested object to the parent server is performed independently of one or more functions of the method of managed object replication and delivery such as the checking if a server has the requested object and serving the requested object to the client if the server has the requested object or redirecting the client to a server that has the requested object (and serving the requested object to the client). Thus, in embodiments, the checking of whether the requested object is popular and replicating the popular requested object to the parent server may be performed in parallel with or before one or more functions of the method of managed object replication and delivery such as the checking if a server has the requested object and serving the requested object to the client if the server has the requested object or redirecting the client to a server that has the requested object (and serving the requested object to the client).

Further, if a parent server does not have the requested object, the parent server could itself use a redirection technique recursively (at 325, 335, 320) until a final parent server is reached that has the requested object. The parent server that has the requested object serves (at 215, 345) the object to the client. If the object is determined to be unavailable (at 335) (from all parent servers), an error message is returned (at 340) regarding the unavailability of the requested object.

As will be apparent to those skilled in the art, numerous methods are available to redirect a requesting client to another parent server, depending on the protocol(s) used to request the object. A handling edge server may request information from a database about which parent server the client should be redirected to. In an implementation, the edge server might have a local database, populated by pushes of redirection data from one or more servers in the network. The edge server may also simply query one or more servers in the network to identify one or more parent servers to which the client can be directed. When more than one parent server responds, the edge server may redirect the client to the parent server that responds to the query first, to the parent server that is topologically closest to the edge server in the network, or to the parent server that represents the best or optimal candidate based on criteria such as network efficiency, bandwidth requirement and/or cost. Alternatively, an edge server may always go to default parent servers. Or, as discussed in relation to edge servers, a best or optimal parent server may be determined using any of the techniques outlined above. Redirection may be performed by simply sending the request on to a parent server or returning redirection information to the client for accessing the parent server. As will be apparent to those skilled in the art, any number of implementations may be used to provide the redirection information to the handling edge server.

In other embodiments, where the parent servers collectively are not populated with all of the objects and the network has access to the origin server of a requested object, the client may be redirected (at 225, 320) to the origin server if the requested object is not available on the parent servers. If the origin server has the requested object (at 325), the origin server would serve (at 230, 345) the object directly to the client (not shown in FIG. 1). Otherwise, if the object is unavailable (at 335), an error message would be returned (at 340) regarding the unavailability of the requested object.
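The serve-or-redirect flow of FIG. 3(a) can be sketched as follows. This is a minimal illustration only: the function name `handle_request`, the use of plain sets to stand in for server storage, and the tier labels in the returned path are all assumptions made for the example, and the independent popularity checks (at 315, 330) are deliberately left out.

```python
def handle_request(obj, edge_store, parent_stores, origin_store=None):
    """Return (serving_tier, redirect_path) for a requested object.

    edge_store    -- set of object names held by the handling edge server
    parent_stores -- list of sets, one per parent server, in redirect order
    origin_store  -- optional set of object names held by the origin server
    All names and data shapes here are hypothetical.
    """
    path = ["edge"]
    if obj in edge_store:                       # edge has the object (305, 310)
        return "edge", path
    for i, store in enumerate(parent_stores):   # redirect to parents (320, 325)
        tier = f"parent{i}"
        path.append(tier)
        if obj in store:                        # parent serves the object (345)
            return tier, path
    if origin_store is not None:                # optional origin fallback (225)
        path.append("origin")
        if obj in origin_store:
            return "origin", path
    # object unavailable everywhere (335, 340)
    raise LookupError(f"object {obj!r} is unavailable")
```

For instance, a request for an object held only by the second parent would return the tier `"parent1"` together with the path through the edge server and the first parent.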

Referring to FIG. 3(b), when an edge and/or parent server determines (at 350) that a requested object is popular (by some measure of popularity) but the edge and/or parent server does not have a copy of the object, the edge and/or parent server initiates a pull of the object to the edge and/or parent server. So, for example, when the edge server determines (at 350) that a requested object is popular but the edge server does not have a copy of the requested object, the edge server initiates the replicating (at 220, 360) of the popular requested object to the edge server from a parent server that has the requested object. Similarly, for example, when a parent server 120 determines (at 350) that a requested object is popular but the parent server does not have a copy of the requested object, the parent server initiates the replicating (at 240, 360) of the popular requested object to the parent server from an origin server that has the requested object. Alternatively, a parent and/or origin server may receive information regarding object popularity, such as popularity determinations for objects or data about object popularity, from one or more edge and/or parent servers and may push popular objects to the edge and/or parent servers. So, for example, when the parent server determines (at 350) that a requested object is popular at an edge server but the edge server does not have a copy of the requested object, the parent server may initiate the replicating (at 220, 360) of the popular requested object to the edge server from the parent server. Similarly, for example, when the origin server determines (at 350) that a requested object is popular at a parent server but the parent server does not have a copy of the requested object, the origin server initiates the replicating (at 240, 360) of the popular requested object to the parent server from the origin server.

In some embodiments, if none of the parent servers has the requested object, the edge server initiates the replication (at 235, 360) of the popular requested object to the edge server from the origin server having the requested object (if the network has access to the origin server). Preferably, in each case, the replicated object is not served or further replicated until the object has been completely copied to the respective server. Optionally, such replicating may be utilized by and between the parent servers themselves to facilitate the reduction of the traffic to and from the origin server. Further, if the edge and/or parent server does not have adequate space for the popular requested object, one or more objects may be purged (at 355) from the edge and/or parent server to make space for the popular object. In embodiments, the method depicted in FIG. 3(c) and discussed in more detail below is employed to determine whether any object(s) in the edge and/or parent server is no longer popular and if so, to delete the no longer popular object(s) from the edge and/or parent server. Also, as will be apparent to those skilled in the art, servers other than the edge and/or parent server for which an object is determined popular may perform the actual determination of whether an object is popular by using, for example, popularity information provided by the handling edge and/or parent server. The popularity determinations can then be used to initiate replication (for example, pushing or pulling) of the object to the edge and/or parent server for which the object is determined popular.

Referring to FIG. 3(c), if an object in a server's storage is no longer popular (at 365), the server may delete the object (at 370) from the storage. For example, an edge server may delete (at 245, 370) any objects from the edge server's storage that are no longer popular. Similarly, a parent server may delete (at 250, 370) any objects from the parent server's storage that are no longer popular. As will be apparent to those skilled in the art, the determining of whether any object(s) in the server's storage is no longer popular and if so, deleting the no longer popular object(s) from the server's storage may be performed independently of, for example in parallel with or before, one or more functions of the method of managed object replication and delivery. In embodiments, the no longer popular objects are removed from edge servers and, if the no longer popular objects are hosted on an origin server, from parent servers.

Determining Popularity

Any number of techniques may be used to determine the popularity of an object. Determining the popularity can be based on the number of requests. Popularity can also be based on the request rate. Popular objects typically have higher request rates or a higher number of requests than unpopular objects. Popularity can also be determined by tracking the last X request times for an object and then using the difference between the current time and these request times to calculate a running average for how often the object is requested. The popularity can also be gauged on a request rate for the object that is weighted toward more recent requests (which is a predictor that the object will be requested again). An exponential decay method or an artificial neural network could also be used to determine popularity of an object.
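One possible reading of the "last X request times" technique is sketched below. The text does not fix a formula, so this sketch assumes that the running average is the mean age of the retained request times, with a lower mean age indicating a more popular object. The class name `RecentRequestTracker` and its methods are hypothetical.

```python
from collections import deque


class RecentRequestTracker:
    """Keep the last x request times of one object (illustrative sketch)."""

    def __init__(self, x=10):
        # A bounded deque silently drops the oldest time once x are stored.
        self.times = deque(maxlen=x)

    def record_request(self, now):
        self.times.append(now)

    def mean_age(self, now):
        """Average of (now - request time) over the retained requests.

        Returns None when no request has been seen. Using the mean age as
        the popularity signal is an assumption of this sketch.
        """
        if not self.times:
            return None
        return sum(now - t for t in self.times) / len(self.times)
```

With `x=2` and requests at times 0, 10 and 20, only the requests at 10 and 20 are retained, so the mean age at time 30 is 15.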

According to some embodiments of a popularity computation and referring to FIG. 4, the popularity of an object is based on the request rate of the object and computed over a sliding time window in a discrete manner. In these embodiments, the variable I denotes the time interval over which the popularity of an object is measured. The time interval is divided into N equal sub-intervals of duration I/N. As will be apparent, the time interval is not required to be equally divided and may instead be divided in other manners.

A linked list P of size N is created for each object. The value of N determines the quality of approximation. The smaller the value of N, the coarser the approximation. In some embodiments, the value of N is set to 5.

The first element P[1] of the list records the number of requests that arrived when the current time was within the first sub-interval, the second element P[2] records the number of requests that arrived when the current time was within the second sub-interval, and so on. When a new sub-interval arrives, the list is rotated such that P[i] becomes P[i+1] except for P[N] which becomes P[1], so, e.g., P[1] becomes P[2], P[2] becomes P[3], and P[N] becomes P[1]. After the rotation, the new P[1] is reset to zero. Accordingly, only the end time of the first sub-interval needs to be recorded and compared against the current time to check if the list should be rotated. For each new request within the sub-interval, P[1] is simply incremented by 1. In this way, the arrival time of each request need not be recorded.

In preferred embodiments, the popularity of an object is simply the sum of all numbers in the list. To make the computation more efficient, the sum P[2]+P[3]+ . . . +P[N] is stored in a register M. The popularity can then be computed by adding P[1] to M. When a rotation occurs, M is updated as M+=P[1]−P[N], using the values of P[1] and P[N] from before the rotation. The popularity of an object may be queried constantly. So, to avoid the extra addition involved for each such inquiry, the value of P[1] can be set to M after the rotation, so that P[1] starts from M rather than from zero and is incremented on each request. Then, the value of P[1] is the popularity of the object.
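The update of M follows directly from the rotation. As a short derivation, write P' and M' for the values after a rotation (notation introduced here only for illustration), and take P[1] to be the plain request count of the sub-interval that just ended:

```latex
\begin{aligned}
M  &= \sum_{k=2}^{N} P[k], \qquad P'[k] = P[k-1] \quad (2 \le k \le N),\\
M' &= \sum_{k=2}^{N} P'[k] = \sum_{k=1}^{N-1} P[k]
    = \Bigl(P[1] + \sum_{k=2}^{N} P[k]\Bigr) - P[N]
    = M + P[1] - P[N].
\end{aligned}
```

So each rotation adds the count of the sub-interval that just ended and drops the count of the oldest sub-interval, which is what M+=P[1]−P[N] expresses.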

The popularity computation algorithm may be summarized as follows. The linked list P of size N for an object, wherein each of P[1] . . . P[N] represents a time sub-interval, is initialized (at 400). The popularity M is also initialized (at 410). If there is a request for the object while the current time is within the current time sub-interval (at 420), then the value of P[1] is incremented (at 430) by 1. If the current time is within a new time sub-interval (at 440), then the value of P[1] is decremented by the value of M (which recovers the request count of the sub-interval just ended), M+=P[1]−P[N], the list P is rotated and P[1] is set to the value of M (at 450). Then, provided the popularity computation is continued (at 460), i.e., the popularity computation is not terminated, the popularity computation algorithm repeats itself.
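The steps of FIG. 4 can be sketched in code as below. This is an illustrative sketch rather than a definitive implementation: it uses a Python list instead of a linked list, index 0 plays the role of P[1] and index N-1 the role of P[N], and the class name `SlidingWindowPopularity`, its method names, and the explicit `now` arguments are assumptions made for the example. The loop in `_rotate_if_due` is also an addition, so that several sub-intervals passing without any request are handled.

```python
class SlidingWindowPopularity:
    """Discrete sliding-window request counter (hypothetical sketch)."""

    def __init__(self, interval, n=5, now=0.0):
        self.sub = interval / n      # sub-interval duration I/N
        self.p = [0] * n             # P[1..N], initialized (at 400)
        self.m = 0                   # M = P[2] + ... + P[N] (at 410)
        self.end = now + self.sub    # end time of the current sub-interval

    def _rotate_if_due(self, now):
        # A new sub-interval has started (at 440); repeat for idle gaps.
        while now >= self.end:
            self.p[0] -= self.m                 # recover the raw count of P[1]
            self.m += self.p[0] - self.p[-1]    # M += P[1] - P[N]
            self.p = [self.m] + self.p[:-1]     # rotate; new P[1] starts at M
            self.end += self.sub

    def record_request(self, now):
        self._rotate_if_due(now)
        self.p[0] += 1               # increment P[1] (at 430)

    def popularity(self, now):
        self._rotate_if_due(now)
        return self.p[0]             # P[1] already holds M plus current count
```

Because P[1] is kept equal to M plus the count of the current sub-interval, reading the popularity is a single lookup, and no individual request time is stored.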

Initiating Replication

Furthermore, any number of techniques may be used to initiate replication of an object. An edge server and/or a parent server might replicate an object on the first request by a client for the object. Alternatively, the edge server and/or parent server may be tuned to wait until the edge server and/or parent server receives a specific number or range of requests for the object. In other implementations, the object may be pulled if the object is more popular (e.g., a higher request rate) than the least popular object currently in the storage. In yet another alternative, the replicating decision can be a function of the popularity of the object, the cost of storing the object, the cost of pulling the object from the network and any other relevant cost factors. However, the popularity of objects may change significantly with time. Initiating a pull decision of an object purely based on a fixed threshold does not capture this dynamic nature of popularity.

A replication policy that compares against the least popularity of replicated objects has its limitations, although the policy does not use a fixed threshold. Consider a case where the storage is only half full but all the replicated objects are extremely popular. Since only objects exceeding the least popularity of the replicated objects will be replicated under this replication policy, objects with moderate popularity will be rejected even though there is plenty of storage space available and the objects are reasonably popular.

Accordingly, a replication scheme should be able to automatically adjust the replication threshold by taking into consideration the dynamic nature of popularity and the fullness of the storage. If there are more popular objects than the storage capacity allows, the replication scheme should raise the threshold. If there is more available storage capacity, the replication scheme should decrease the threshold so that more objects can be stored.

According to embodiments of a replication scheme and referring to FIG. 5, an object is replicated (at 520) into storage when the popularity P of the object is greater (at 500) than the initial threshold P_(I) and when there is enough space (at 510) in the storage to replicate the object. If there is not enough storage to replicate the requested object, a replacement algorithm is performed in which the popularity P of the object is compared (at 530) against the popularity P_(L) of the least popular object in the storage. If P is greater than P_(L), the current least popular object is removed (at 540) from the storage to free up more storage space, the next least popular object is identified (at 540), the value of the least popularity is updated (at 550), and a new iteration begins by checking if there is enough storage space to store the requested object (at 510). The storage space freeing iteration is terminated when either 1) enough storage space has been freed up to accommodate the requested object or 2) the requested object is not as popular as the least popular object in the storage. In embodiments, the least popular objects are removed from edge servers and, if there are origin servers with a copy of the least popular objects, from parent servers. Where no origin servers exist with a copy of the least popular objects, least popular objects are not removed from parent servers in order to keep a copy of the least popular objects in the network.
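By way of illustration only, the admission and replacement loop of FIG. 5 may be sketched in Python as follows. The function name, the dictionary layout of the storage, and the use of a single size unit are assumptions made for the example. The special handling of parent servers that hold the only copy of an object is omitted.

```python
def try_replicate(storage, capacity, name, size, popularity, p_initial):
    """Decide whether to replicate an object, evicting less popular ones if needed.

    storage maps object name -> (size, popularity) and is modified in place.
    Returns True when the object was admitted into storage.
    """
    # Step 500: the object must be more popular than the threshold P_I.
    if popularity <= p_initial:
        return False

    used = sum(obj_size for obj_size, _ in storage.values())

    # Step 510: loop until there is enough free space for the object.
    while capacity - used < size:
        if not storage:
            return False  # the object is larger than the whole storage
        # Identify the least popular stored object; its popularity is P_L.
        least = min(storage, key=lambda key: storage[key][1])
        # Step 530: stop when the object is not more popular than P_L.
        if popularity <= storage[least][1]:
            return False
        # Step 540: remove the least popular object to free space.
        used -= storage[least][0]
        del storage[least]

    # Step 520: replicate the object into storage.
    storage[name] = (size, popularity)
    return True
```

As in the text, the loop ends either when enough space has been freed or when the requested object is no more popular than the least popular stored object. In the second case, objects evicted in earlier iterations are not restored in this sketch.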

Purging

In some embodiments, the managed object replication and delivery method and system records the time on which an object was last requested. A purge scheme is invoked to clean up the storage of servers, for example, on a regular time interval basis or when a popular object is replicated to an edge and/or parent server but there is inadequate space at the edge and/or parent server. Referring to FIG. 6, in the purge scheme, all stale objects are removed from the storage (at 600), the remaining objects are sorted based on popularity (at 610), and the new values of P_(L) and P_(I) are determined (at 620, 630). An object is stale if its age (that is, the time since the object was last requested) is over a pre-defined value, typically set to the duration of the sliding window used to measure the popularity multiplied by an adjustable factor. As will be apparent to those skilled in the art, the value may vary and indeed other staleness algorithms may be used. The popularity of the least popular object in the storage after purging is assigned as the new P_(L). The new P_(I) is determined by using the sorted popularity and is set to the popularity of the last object that can fit into the storage if more popular objects are replicated first. Typically, P_(L) should be greater than or equal to P_(I). If not, the value of P_(L) is assigned to be the new P_(I). In some embodiments, the purge process is implemented as a separate thread in a multi-thread system. In embodiments, the stale objects are removed from edge servers and, if there are origin servers with a copy of the stale objects, from parent servers. Where no origin servers exist with a copy of the stale objects, stale objects are not removed from parent servers in order to keep a copy of the stale objects in the network.
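By way of illustration only, the purge scheme of FIG. 6 may be sketched in Python as follows. The text does not say which objects are ranked when P_(I) is determined. This sketch assumes that stored objects are ranked together with tracked objects that have not yet been replicated (`candidates`), so that P_(I) can fall below P_(L) when free space remains. That reading, the `TrackedObject` record, the function name, and the `minimum` floor applied to both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    name: str
    size: int
    popularity: float
    last_requested: float


def purge(stored, candidates, capacity, now, max_age, minimum=0.0):
    """Return (kept, p_least, p_initial) after removing stale stored objects."""
    # Step 600: remove objects whose age exceeds the staleness limit.
    kept = [o for o in stored if now - o.last_requested <= max_age]
    # Step 610: sort the remaining objects, most popular first.
    kept.sort(key=lambda o: o.popularity, reverse=True)
    # New P_L: popularity of the least popular object left in storage.
    p_least = kept[-1].popularity if kept else minimum

    # New P_I (ASSUMPTION): popularity of the last object that fits when
    # stored and candidate objects are placed most popular first.
    ranked = sorted(kept + list(candidates),
                    key=lambda o: o.popularity, reverse=True)
    p_initial = p_least
    used = 0
    for obj in ranked:
        if used + obj.size > capacity:
            break
        used += obj.size
        p_initial = obj.popularity

    # P_L should be greater than or equal to P_I; otherwise P_I takes P_L.
    if p_least < p_initial:
        p_initial = p_least
    return kept, max(p_least, minimum), max(p_initial, minimum)
```

When the storage is nearly full, the fill stops early and P_(I) equals P_(L). When space remains, P_(I) drops to the popularity of the least popular candidate that still fits.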

At the outset, when the system starts and there is no popularity data available yet, the initial values of both P_(L) and P_(I) can be set to zero. This forces the replication scheme to store the objects on their first request, but the purge scheme that is run on a regular basis will adjust the values of P_(L) and P_(I) automatically. The initial values of P_(L) and P_(I) can also be set to other values. Indeed, the initial values of P_(L) and P_(I) can be determined by taking into consideration the cost of storage, the cost of fetching, and the cost difference in deliveries from different servers. In any case, the system allows the specification of minimum P_(L) and P_(I). If a computed P_(L) or P_(I) is smaller than the minimum specification, P_(L) or P_(I) is set to the minimum specification.

In some embodiments, to avoid or minimize stream thinning and other quality problems, storage fill is separated from data delivery. In this way, the data transfer between multiple sets of storages can tolerate a slower connection, and a server never streams an object unless the object is entirely in the storage. As will be apparent to those skilled in the art, it is alternatively possible to start streaming an object once there is enough data in the storage, so that replication need not be completed before the object is served. Further, storage fill may be staged by copying to a separate location, then moving the copied data to a servable location when the data is complete.

Further, if an object is changed at an origin server, there may be a need to broadcast a message to remove the object at one or more parent servers and/or one or more edge servers. Similarly, if an object is changed at the parent server(s), there may be a need to broadcast a message to remove the object at one or more edge servers. In each case, future requests for the removed object would be handled as in the normal case where a requested object is not available at an edge server and/or a parent server.

Hardware and Software

In embodiments of the invention, referring to FIGS. 1 and 7, the system of managed object replication and delivery comprises one or more servers in a network designated as parent servers and one or more servers in the network designated as edge servers. In some embodiments, referring to FIG. 1, parent servers 120 have large storage capacity (on the order of 5 terabytes (TB)) while edge servers 130 have smaller storage space (ranging from 500 GB to 1 TB). One or more redirectors for implementing the method of managed object replication and delivery are installed on each edge server cluster. In some embodiments, one or more objects are replicated to one or more of the parent servers from the origin servers and then pulled from the parent servers to the edge servers as needed. In other embodiments, one or more objects are replicated to one or more of the edge servers and/or to one or more of the parent servers, from the origin servers as needed.

In some embodiments, a data transfer method 700, 710 is implemented to transfer data between parent servers and edge servers. The data transfer method supports the Transport Layer Security (TLS) protocol (described in the Internet Engineering Task Force (IETF) RFC 2246, located at “http://www.ietf.org/rfc/rfc2246.txt”, incorporated by reference herein) to ensure communication privacy. Further, the implementation of the method for managed object replication and delivery supports three popular object formats, namely Apple Computer, Inc.'s QuickTime™, RealNetworks, Inc.'s Real™, and Microsoft Corporation's WindowsMedia™ formats for streaming of requested object(s). As will be apparent to those skilled in the art, any number of other protocols and object formats may be used.

Further, in some embodiments, a number of software components are used to facilitate the method of managed object replication and delivery. A first component is a WindowsMedia redirector 720, 760 which is a service running on the Microsoft Windows NT operating system that processes requests from a Windows Media player and performs the redirection of the request for Windows Media objects. The WindowsMedia redirector is provided on edge servers and parent servers. Currently, the Microsoft Media Server (MMS) protocol is used for streaming of Windows Media objects and that protocol does not support redirection. To provide redirection for the streaming of Windows Media objects, the uniform resource identifier (URI) hyperlinks at the customer's site for such streaming Windows Media objects are modified. URIs as used herein generally have the following form (defined in detail in T. Berners-Lee et al, Uniform Resource Identifiers (URI), IETF RFC 2396, August 1998, located at “http://www.ietf.org/rfc/rfc2396.txt”, incorporated by reference herein):

scheme://host[port]/uri-path

where “scheme” can be a symbol such as “http” (see Hypertext Transfer Protocol -- HTTP/1.1, IETF RFC 2616, located at “http://www.ietf.org/rfc/rfc2616.txt”, incorporated by reference herein) for an object on a Web server or “rtsp” (see Real Time Streaming Protocol (RTSP), IETF RFC 2326, located at “http://www.ietf.org/rfc/rfc2326.txt”, incorporated by reference herein) for an object on a streaming server. Other schemes can also be used and new schemes may be added in the future. The port number “port” is optional, the system substituting a default port number (depending on the scheme) if none is provided. The “host” field maps to a particular network address for a particular computer. The “uri-path” is relative to the computer specified in the “host” field. A uri-path is typically, but not necessarily, the path-name of a file in a media server directory. In a preferred embodiment, the HTTP protocol is used to effect the redirection of WindowsMedia objects. Therefore, the “scheme” field of the URIs of the WindowsMedia objects is changed from “mms” to “http”. For example, the URI for a sample object “sample.asf” in the Windows Media Advanced Streaming Format (ASF) will have a new URI of the form “http://host/path/sample.asf”. For objects using Windows Media ASX scripting, a sample URI for the “meta.asx” object will be in the form “http://host/?www.customer.com/path/meta.asx”, where “customer” is the name of the content provider of “meta.asx”. All URIs contained within the “meta.asx” object remain unchanged. Upon receiving the request “http://host/path/sample.asf”, the WindowsMedia redirector would respond to the request with the following example ASX script:

<ASX version="3.0"><Entry><Ref href="mms://servername/path/sample.asf" /></Entry></ASX>

in the message body, if the requested object is found available either locally or on another server (parent or origin). In this example, “servername” is or resolves to the Internet Protocol (IP) address of a media server that will serve the requested object to the requesting client. If the requested object cannot be found, the WindowsMedia redirector would respond to the request with the following example ASX script:

<ASX version="3.0"><Entry><Ref href="http://redirname/path/sample.asf" /></Entry></ASX>

in the message body, where “redirname” is or resolves to the IP address of the redirector of a parent server, to trigger another round of redirection. A final round of redirection is reached when none of the parent servers (and the origin server, if applicable) has the requested object. In this case, the redirection process is terminated, and a “not found” error message is sent to the requesting client.

Requests for ASX objects are processed in a similar way. Upon receiving the request for the sample object “meta.asx”, the WindowsMedia redirector checks the availability of the object pointed to by each URI inside “meta.asx” and rewrites the URI of each object accordingly. Then the WindowsMedia redirector sends a response to the request with the rewritten “meta.asx” in the message body of the response. The URI rewriting is done as follows. If a requested object, for example, “file.asf”, is found available locally or on another server, the corresponding URI would be rewritten to “mms://servername/path/file.asf”, where “servername” is or resolves to the IP address of the media server that will serve the requested object to the requesting client. If “file.asf” cannot be found, the corresponding URI is rewritten to “http://redirname/path/file.asf”, where “redirname” is or resolves to the IP address of a parent server redirector.
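By way of illustration only, the choice between the two example ASX responses may be sketched in Python as follows. The function name and its parameters are assumptions made for the example. How the redirector locates the object and how it sends the “not found” error are outside the sketch.

```python
from xml.sax.saxutils import quoteattr


def asx_redirect(path, media_server=None, parent_redirector=None):
    """Build the ASX message body for a redirected Windows Media request.

    media_server is the host that can serve the object over MMS, if any;
    parent_redirector is the next redirector to try over HTTP, if any.
    Returns None when neither is available, in which case the caller
    would send a "not found" error instead.
    """
    if media_server is not None:
        # Object found: point the player at the media server.
        href = f"mms://{media_server}{path}"
    elif parent_redirector is not None:
        # Object not found here: trigger another round of redirection.
        href = f"http://{parent_redirector}{path}"
    else:
        return None
    # quoteattr escapes the value and wraps it in quotes.
    return f"<ASX version=\"3.0\"><Entry><Ref href={quoteattr(href)} /></Entry></ASX>"
```

For instance, `asx_redirect("/path/sample.asf", media_server="servername")` yields the first example script above, and passing only `parent_redirector="redirname"` yields the second.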

Another component is a Real/QuickTime redirector 730, 770 which is an application that processes Real-Time Streaming Protocol (RTSP) requests from a Real or QuickTime player for one or more objects and performs the redirection of the method for Real and QuickTime objects. The Real/QuickTime redirector is provided on edge servers and parent servers. The RTSP, described in detail in the IETF RFC 2326, is used for streaming Real and QuickTime objects, and the “REDIRECT” method supported in that protocol is used to effect redirection. A redirect request informs the client that it must reconnect to another server location and provides for the client the URI of that new server in the redirect request.

A best or optimal server selection mechanism is also provided (not shown in FIG. 7). The best or optimal server selection mechanism includes selection of an edge server most suitable for delivery of one or more objects to the client according to any number of currently known or future developed algorithms. In addition to redirection to a best or optimal edge server for handling a client request for an object, the best or optimal server mechanism may also be applied to trigger one or more further redirections to one or more parent server(s) when a requested object is not available at the handling edge server. In an implementation, to effect this operation, the hostname part of the URI for a requested object is modified. For example, in the link “http://customer-wm.fpondemand.net/customer/sample.asf”, “customer-wm.fpondemand.net” would be changed to “parent-wm.fpondemand.net”, forcing the request to go through a further round of best or optimal server selection against parent servers only. In such embodiments, to effect best or optimal parent server selection, the parent-edge server topology is defined and the best or optimal server selection mechanism is provided a parent server table defining the relationships of such a topology. In some embodiments, the best or optimal server selection mechanism is similar to the best repeater selector described in U.S. Pat. No. 6,185,598.
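By way of illustration only, the hostname modification in the example link may be sketched in Python as follows. The rule used here, replacing everything before the first hyphen of the hostname with a fixed prefix, is an assumption inferred from the single example given. The function name and the `parent_prefix` parameter are likewise hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_parent_host(uri, parent_prefix="parent"):
    """Rewrite the hostname so the request is resolved against parent servers."""
    parts = urlsplit(uri)
    # Split "customer-wm.fpondemand.net" into "customer", "-", "wm.fpondemand.net".
    _, separator, suffix = parts.netloc.partition("-")
    if not separator:
        return uri  # hostname does not follow the assumed naming pattern
    return urlunsplit(parts._replace(netloc=f"{parent_prefix}-{suffix}"))
```

Only the hostname changes. The scheme and the path of the requested object are carried over unchanged.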

A file replication manager application 740, 750 is also provided that manages object replication to and object removal from storage, retrieves objects from parent servers for replication to edge server storage, and performs storage cleanup as needed. The file replication manager is provided on edge servers and parent servers. In some embodiments, the file replication manager application uses the data transfer method and is in communication with the WindowsMedia and Real/QuickTime redirectors to provide, if available in the storage, objects requested by those redirectors.

In some embodiments, the message communicated between a WindowsMedia or a Real/QuickTime redirector and a file replication manager and between file replication managers is encapsulated using the User Datagram Protocol (UDP). This allows address handling and delivery to be handled by UDP and facilitates fast communication. Since UDP does not guarantee delivery, the message header contains a message number to be used to confirm that a response is to the current query, and not to a previous query. In addition, MD5 (See, e.g., Rivest, R., “The MD5 Message Digest Algorithm”, IETF RFC 1321, April 1992) is supported to provide a basic level of security. The MD5 hash is generated by running an MD5 hash algorithm on the message number, message, and a secret pass phrase shared only by components of the system of managed object replication and delivery. When a message is received, the MD5 hash of the message number, message, and secret pass phrase is computed and compared against the MD5 hash provided in the message. If these two MD5 hashes do not match, the message is invalid and will be discarded.
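By way of illustration only, the message number and MD5 check may be sketched in Python as follows. The wire layout (a 4-byte big-endian message number, then the 16-byte digest, then the message) and the function names are assumptions made for the example. The text specifies only which fields are hashed. Sending the datagram over UDP is left out.

```python
import hashlib
import hmac
import struct

HEADER_LEN = 4 + 16  # message number + MD5 digest (assumed layout)


def pack_message(number, message, secret):
    """Prefix the message with its number and an MD5 hash of number+message+secret."""
    header = struct.pack("!I", number)
    digest = hashlib.md5(header + message + secret).digest()
    return header + digest + message


def unpack_message(datagram, secret, expected_number):
    """Return the message, or None if the datagram is invalid or out of date."""
    if len(datagram) < HEADER_LEN:
        return None
    header, digest, message = datagram[:4], datagram[4:HEADER_LEN], datagram[HEADER_LEN:]
    # Recompute the hash with the shared pass phrase and compare.
    expected = hashlib.md5(header + message + secret).digest()
    if not hmac.compare_digest(digest, expected):
        return None  # hashes do not match: discard
    # The message number confirms the response belongs to the current query.
    (number,) = struct.unpack("!I", header)
    if number != expected_number:
        return None
    return message
```

A receiver that does not know the pass phrase cannot produce a matching hash, and a late response to an earlier query is rejected by the message number check.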

As will be apparent to those skilled in the art, FIG. 7 represents only some embodiments of the system according to the present invention. Many variations for implementing the system according to the teachings of the present invention are possible and are all within the scope of the invention.

Chunking

An extension of the above method and system is to provide chunking. Studies of log data show that, even for popular objects, a substantial percentage of requests for such objects exit before the object is completely served. To exploit this kind of object usage and further enhance the performance of the network, objects can be segmented into chunks and initial chunks of an object can be given preferential treatment in the replication scheme. For example, only the initial chunks of an object are replicated when a replication admission decision is made and the remaining chunks of the object are pulled to the storage only if the client does not exit before a certain amount or number (e.g., 90%) of the initial chunks of the object are served. The initial chunks of an object can be left in the storage even when the object becomes unpopular. By partitioning streams in this manner, a first part of an object can be served from edge servers quickly, even if most of the object stream must be fetched from a parent server or origin server.
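By way of illustration only, the chunk selection rule in the example may be sketched in Python as follows. The function name, the zero-based chunk indices, and the default fraction of 0.9 (taken from the 90% example) are assumptions made for the sketch.

```python
def chunks_to_replicate(total_chunks, initial_chunks, served_chunks,
                        continue_fraction=0.9):
    """Return the indices of the chunks that should be held in storage.

    The initial chunks are always selected. The remaining chunks are added
    only once the client has been served at least continue_fraction of the
    initial chunks without exiting.
    """
    initial = min(initial_chunks, total_chunks)
    wanted = list(range(initial))
    if initial and served_chunks >= continue_fraction * initial:
        wanted.extend(range(initial, total_chunks))
    return wanted
```

For a ten-chunk object with three initial chunks, a client that exits after one chunk leaves only chunks 0 to 2 selected, while a client that is served all three initial chunks causes the remaining seven to be pulled as well.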

Object Retention and Staleness

Optionally, some or all of the objects may be permanently retained in edge server storage or be retained depending on a quota. Similarly, a configurable or automatically adjusting threshold for storage filling and deletion may be provided.

Also, an edge server may be configured to determine whether a requested object in a server's storage is fresh and serve the requested object only when the object is not stale. In some embodiments, a file is maintained which lists the maximum storage age and storage quota in order to facilitate determining whether a requested object is fresh. If a request is received for a stale object, a redirect is initiated to the relevant parent server or origin server to provide the requested object, and a storage refresh will be performed if the requested object is popular.
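By way of illustration only, the freshness decision may be sketched in Python as follows. The function name, the returned tuple, and the comparison of popularity against a single threshold to decide whether the object "is popular" are assumptions made for the example.

```python
def handle_stored_object(storage_age, max_storage_age, popularity, threshold):
    """Decide how to answer a request for an object held in storage.

    Returns (action, refresh): action is "serve" for a fresh object and
    "redirect" for a stale one; refresh says whether the stored copy
    should be refreshed from the parent or origin server.
    """
    if storage_age <= max_storage_age:
        return ("serve", False)
    # Stale: send the client upstream, and refresh only popular objects.
    return ("redirect", popularity > threshold)
```

A stale but unpopular object is therefore redirected without being refreshed, leaving it to be cleaned up later.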

Peers

Also, edge server storage fills of objects may be served by other peer edge servers instead of a relevant parent server or origin server. If a popular object has already been replicated to an edge server, filling a new edge server request for that object from one of the peer edge servers may be more efficient than filling it from the parent server or origin server. Since there are typically more edge servers than parent servers and origin servers, there is an increased likelihood that a peer edge server may be closer in terms of network distance than a relevant parent server or origin server. Moreover, such peer edge server storage fills could also lessen the burden on the parent servers or origin servers.

The detailed descriptions may have been presented in terms of program procedures executed on a computer or network of computers. These procedural descriptions and representations are the means used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. The embodiments of the invention may be implemented as apparent to those skilled in the art in hardware or software, or any combination thereof. The actual software code or hardware used to implement the invention is not limiting of the invention. Thus, the operation and behavior of the embodiments often will be described without specific reference to the actual software code or hardware components. The absence of such specific references is feasible because it is clearly understood that artisans of ordinary skill would be able to design software and hardware to implement the embodiments of the invention based on the description herein with only a reasonable effort and without undue experimentation.

A procedure is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. These operations comprise physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, objects, attributes or the like. It should be noted, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.

Further, the manipulations performed are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. No such capability of a human operator is necessary, or desirable in most cases, in any of the operations of the invention described herein; the operations are machine operations. Useful machines for performing the operations of the invention include general purpose digital computers, special purpose computers or similar devices.

Each operation of the method may be executed on any general computer, such as a mainframe computer, personal computer or the like and pursuant to one or more, or a part of one or more, program modules or objects generated from any programming language, such as C++, Perl, Java™, Fortran, etc. And still further, each operation, or a file, module, object or the like implementing each operation, may be executed by special purpose hardware or a circuit module designed for that purpose. For example, the invention may be implemented as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a processor or other digital signal processing unit. Any data handled in such processing or created as a result of such processing can be stored in any memory as is conventional in the art. By way of example, such data may be stored in a temporary memory, such as in the RAM of a given computer system or subsystem. In addition, or in the alternative, such data may be stored in longer-term storage devices, for example, magnetic disks, rewritable optical disks, and so on.

In the case of diagrams depicted herein, they are provided by way of example. There may be variations to these diagrams or the operations described herein without departing from the spirit of the invention. For instance, in certain cases, the operations may be performed in differing order, or operations may be added, deleted or modified.

Embodiments of the invention may be implemented as an article of manufacture comprising a computer usable medium having computer readable program code means therein for executing the method operations of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform the method operations of the invention, or a computer program product. Such an article of manufacture, program storage device or computer program product may include, but is not limited to, CD-ROM, CD-R, CD-RW, diskettes, tapes, hard drives, computer system memory (e.g., RAM or ROM), and/or the electronic, magnetic, optical, biological or other similar embodiments of the program (including, but not limited to, a carrier wave modulated, or otherwise manipulated, to convey instructions that can be read, demodulated/decoded and executed by a computer). Indeed, the article of manufacture, program storage device or computer program product may include any solid or fluid transmission medium, whether magnetic, biological, optical, or the like, for storing or transmitting signals readable by a machine for controlling the operation of a general or special purpose computer according to any or all methods of the invention and/or to structure its components in accordance with a system of the invention.

Embodiments of the invention may also be implemented in a system. A system may comprise a computer that includes a processor and a memory device and optionally, a storage device, an output device such as a video display and/or an input device such as a keyboard or computer mouse. Moreover, a system may comprise an interconnected network of computers. Computers may equally be in stand-alone form (such as the traditional desktop personal computer) or integrated into another apparatus (such as a cellular telephone).

The system may be specially constructed for the required purposes to perform, for example, the method of the invention or the system may comprise one or more general purpose computers as selectively activated or reconfigured by a computer program in accordance with the teachings herein stored in the computer(s). The system could also be implemented in whole or in part as a hard-wired circuit or as a circuit configuration fabricated into an application-specific integrated circuit. The invention presented herein is not inherently related to a particular computer system or other apparatus. The required structure for a variety of these systems will appear from the description given.

While this invention has been described in relation to certain embodiments, it will be understood by those skilled in the art that other embodiments according to the generic principles disclosed herein, modifications to the disclosed embodiments and changes in the details of construction, arrangement of parts, compositions, processes, structures and materials selection all may be made without departing from the spirit and scope of the invention. Changes, including equivalent structures, acts, materials, etc., may be made, within the purview of the appended claims, without departing from the scope and spirit of the invention in its aspects. Thus, it should be understood that the above described embodiments have been provided by way of example rather than as a limitation of the invention and that the specification and drawing(s) are, accordingly, to be regarded in an illustrative rather than a restrictive sense. As such, the invention is not intended to be limited to the embodiments shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein.

1. A method in a distributed computing environment having a plurality of edge servers and at least one parent server, wherein the edge servers are arranged in hierarchical fashion relative to the at least one parent server, the plurality of edge content servers and the at least one parent server forming a content delivery network (CDN), the method comprising: directing a request by a client for an object to a first edge server in the CDN; and if a copy of the requested object is stored on the first edge server, serving the requested object to the client from the first edge server; and, if a copy of the requested object is not stored on the first edge server, directing the client to a parent server in the CDN associated with the first edge server for delivery of the requested object therefrom and determining whether to replicate the requested object on the first edge server for use in serving future client requests based on a measure of popularity of the requested object.
2. A method as recited in claim 1, further comprising: replicating the requested object on the first edge server when the measure of popularity of the requested object exceeds a popularity threshold.
3. A method as recited in claim 2, wherein the measure of popularity is defined based on a total number of client requests for the object received at the first edge server.
4. A method as recited in claim 1, wherein if the parent server associated with the first edge server does not have a copy of the requested object, directing the client to another server in the CDN.
5. A method as recited in claim 4, wherein the step of directing is repeated for a plurality of other servers in the CDN.
6. A method as recited in claim 4, wherein the other server to which the client is directed if the parent server associated with the first edge server does not have a copy of the requested object is another parent server in the CDN.
7. A method as recited in claim 1, wherein the distributed computing environment further comprises at least one origin server, the method further comprising: if a copy of the requested object is not stored on the parent server associated with the first edge server, directing the client to at least one origin server for delivery of the requested object therefrom and determining whether to replicate the requested object on the parent server associated with the first edge server for use in serving future client requests based on a measure of popularity of the requested object.
8. A method as recited in claim 1, wherein the measure of popularity of the requested object is a dynamic measure of popularity.
9. A method as recited in claim 1, wherein the requested object is a streaming media object.
10. A method, in a framework in which multiple resources of multiple content providers are delivered to multiple end user clients via a shared content delivery network (CDN) formed of a plurality of CDN server sites, each server site comprising one or more servers, the method comprising: causing a client request for a resource to be directed to a first server site in said CDN; if the first server site has a copy of the requested resource, then serving the requested resource to the client from the first server site; otherwise, if the first server site does not have a copy of the requested resource, determining a measure of popularity of the requested resource relative to the first server site and replicating the requested resource to the first server site if the determined measure of popularity meets or exceeds a popularity threshold associated with the requested resource.
11. A method as recited in claim 10, further comprising: if the first server site does not have a copy of the requested resource, causing the client to be directed to a second server site in the CDN.
12. A method as recited in claim 11, further comprising: attempting to serve the requested resource to the client from the second server site in the CDN.
13. A method as recited in claim 10, wherein the measure of popularity of the requested resource is a dynamic measure of popularity.
14. A method, in a framework in which multiple resources of a content provider are delivered to multiple end user clients via a content delivery network (CDN) formed of a plurality of CDN server sites, each server site comprising one or more servers, wherein at least some of the CDN server sites are edge server sites each comprising one or more edge servers, and wherein at least some others of the CDN sites are parent server sites each comprising one or more parent servers, the method comprising: (A) responsive to a client request for a resource at an edge server site in said CDN; and (B) if the edge server site has a copy of the requested resource, then serving the requested resource to the client from the edge server site; otherwise, (C) replicating a copy of the requested resource on the edge server site if a measure of popularity of the requested resource exceeds a popularity threshold associated with the requested object, otherwise not replicating the requested resource on the edge server site.
15. A method as recited in claim 13, further comprising: (D) if the edge server site does not have a copy of the requested resource, then attempting to serve the requested object to the client from a second server site in the CDN.
16. A method as recited in claim 15, wherein the second server site in the CDN has a copy of the requested resource.
17. A method as recited in claim 13, wherein the measure of popularity of the requested resource is a dynamic measure of popularity.
18. A method, in a framework in which multiple resources of a content provider are delivered to multiple end user clients via a content delivery network (CDN), the CDN being formed of a plurality of edge servers and at least one parent server, wherein the edge servers are arranged in hierarchical fashion relative to the at least one parent server, the method comprising: responsive to a client request for a resource at a first edge server in said CDN, determining if a copy of the requested resource is stored on the first edge server and, if not, then replicating a copy of the requested resource on the first edge server if a measure of popularity of the requested resource exceeds a popularity threshold associated with the requested object.
19. A method as recited in claim 18, wherein if the first edge server does not have a copy of the requested resource, the method comprises: attempting to serve the requested resource to the client from another server in the CDN, distinct from the first edge server.
20. A method as recited in claim 18, further comprising: directing the client to the other server in the CDN for delivery of the requested resource therefrom.
21. A method as recited in claim 20, wherein the other server in the CDN is a parent server to which the first edge server is associated.
22. A method as recited in claim 21, wherein the framework further comprises at least one origin server associated with the content provider, the method further comprising: if a copy of the requested object is not stored on the parent server associated with the first edge server, directing the client to at least one origin server for delivery of the requested object therefrom and determining whether to replicate the requested object on the parent server associated with the first edge server for use in serving future client requests based on a measure of popularity of the requested object.
23. A content delivery method comprising: in response to a client request at a first server in a content delivery network (CDN), said request being for an object, said client request having been directed to said first server regardless of whether the first server has the requested object, when the first server does not have a copy of the requested object, determining whether to replicate the requested object on the first server based at least in part on comparing a measure of popularity of the requested object against a popularity threshold associated with the requested object.
24. A method as recited in claim 23, further comprising: replicating the requested object on the first server when the measure of popularity of the requested object exceeds the popularity threshold.
25. A method as recited in claim 24, wherein the measure of popularity is defined based on a total number of client requests for the object received at the first server.